-
Notifications
You must be signed in to change notification settings - Fork 200
Integrate Coral Type System for Hive Tables #563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| * Two-stage conversion: Hive → Coral → Calcite. | ||
| * This is the preferred path when using CoralCatalog. | ||
| */ | ||
| private RelDataType getRowTypeViaCoralTypeSystem(RelDataTypeFactory typeFactory) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we need this. I would imagine both Hive and Iceberg tables convert their types to Coral type as part as part their table API implementation. Internally we have common conversion between Coral and Calcite. IN this step, it does not matter we are coming from Hive or Iceberg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are checking in hive table -> coral -> calcite type system prior to introducing the iceberg type system to roll out the coral type system incrementally. so this intermediate state is required.
a future refactor will be applied in #556 when we introduce iceberg type system and those conversions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't the radius of impact of this be too large? I would keep current constant as much as possible and introduce new features on new things, then we can flip the flag for "existing" things.
If the objective is to validate, can we we continue to use the old implementation, but also compare in the background and log a non-fatal error when there is a mismatch? This way we will know from the log instead of from the user :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this layer of coral, or for that matter coral by itself, is unaware of "new" vs "old", so it needs to be a switch at some point. Building out awareness of "new features" in coral seems like a good product investment in general, but let's pursue that outside of this PR. These changes will get rolled out with #556, Im trying to establish a checkpoint in the middle for closer monitoring.
We're rolling out this change with a fallback, regression testing across all code paths and i-tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean by Coral is not aware of "new" vs "old". The PR's objective is to introduce "old" and "new", so both are known to Coral. Here is what I meant.
public RelDataType getRowType(RelDataTypeFactory typeFactory) {
RelDataType hiveType = getRowTypeFromHiveType(typeFactory);
try {
RelDataType coralType = getRowTypeFromCoralType(typeFactory);
if (!hiveType.equals(coralType)) {
LOG.warn(
"Hive and Coral type conversion mismatch for table {}. " +
"Hive: {}, Coral: {}",
hiveTable.getTableName(),
hiveType,
coralType
);
// Optional: throw a non-fatal exception
throw new RuntimeException(
"Type conversion mismatch between Hive and Coral for table " +
hiveTable.getTableName()
);
}
} catch (Exception e) {
LOG.warn(
"Coral validation failed for table {}. Proceeding with Hive type. Error: {}",
hiveTable.getTableName(),
e.getMessage(),
e
);
}
return hiveType; // Always return Hive path
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I can accommodate this change. just one additional clarification, how are we thinking of accommodating this approach with HiveCoralTable & IcebergCoralTable with #556 ?
HiveCoralTable implements CoralTable {
public CoralSchema getSchema() {
// integrate coral type system
}
and internally in HiveTable.java, for hive tables, we use the same approach - prefer getRowTypeFromHiveType() over getRowTypeFromCoralType() . but for iceberg tables, we prefer getRowTypeFromCoralType() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us discuss the strategy offline and update here. There are a few options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated this PR, PTAL
What changes are proposed in this pull request, and why are they necessary?
This PR completes the integration of the Coral type system (introduced in #558) by implementing the glue logic for two-stage schema conversion: Hive → Coral → Calcite. This architectural change enables better type system abstraction while maintaining 100% backward compatibility with existing behavior.
Key Changes
HiveToCoralTypeConverter: Converts Hive TypeInfo → Coral data typesCoralTypeToRelDataTypeConverter: Converts Coral data types → Calcite RelDataTypeBoth converters have comprehensive unit test coverage in
HiveToCoralTypeConverterTestHiveTable.getRowType()New path (shadow mode):
getRowTypeViaCoralTypeSystem()implements Hive → Coral → Calcite pipelineLegacy path (production path):
getRowTypeDirectConversion()preserves original TypeConverter logicLogging: Warns on mismatch or failure (zero production impact)
New method: getCoralSchema() for Hive → Coral schema conversion
TIMESTAMPprecision: UsesPRECISION_NOT_SPECIFIED(-1) to match legacy behavior (no explicit precision)UNIONtypes: Generates {tag, field0, field1, ...} struct matching Trino's pattern (required by coalesce_struct UDF)VOIDtype: Maps toSqlTypeName.NULL(not STRING)Nullability: All types default to nullable per Hive semantics
How was this patch tested?
Rollout
This PR: Shadow mode - validate with zero risk
Follow up PR: Switch to Coral as primary after log validation